Use of neural network mapping and extended Kalman filter to recover vocal tract resonances from the MFCC parameters of speech
Authors
Abstract
In this paper, we present a state-space formulation of a neural-network-based hidden dynamic model of speech whose parameters are trained using an approximate EM algorithm. The training uses the output of an off-the-shelf formant tracker (during the vowel segments) to simplify the complex sufficient statistics that the exact EM algorithm would require. The trained model consists of a state equation for the target-directed vocal tract resonance (VTR) dynamics on all classes of speech sounds (including consonant closure) and an observation equation that maps the VTR to the acoustic measurement. It is then used to recover the unobserved VTR with an extended Kalman filter. The results demonstrate accurate estimation of the VTRs, especially during rapid consonant-vowel or vowel-consonant transitions and during consonant closure, when the acoustic measurement alone provides weak or no information from which to infer the VTR values.
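The recovery step described in the abstract can be illustrated with a minimal sketch of one extended Kalman filter recursion. Everything specific here is an assumption made for illustration and is not taken from the paper: the state equation is written as z[k+1] = Phi z[k] + (I - Phi) target + w[k], the observation network is a randomly initialised one-hidden-layer tanh network (`make_mlp` is a hypothetical helper, not the trained model), the VTR state is three values in kHz, and the noise covariances are arbitrary.

```python
import numpy as np


def make_mlp(rng, n_state, n_hidden, n_obs):
    """Hypothetical tanh network standing in for the VTR-to-MFCC mapping."""
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_state))
    b1 = rng.normal(scale=0.1, size=n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_obs, n_hidden))
    b2 = rng.normal(scale=0.1, size=n_obs)

    def h(z):
        # Observation function: state vector -> predicted acoustic vector.
        return W2 @ np.tanh(W1 @ z + b1) + b2

    def jac(z):
        # Analytic Jacobian dh/dz, needed to linearise h in the EKF update.
        a = np.tanh(W1 @ z + b1)
        return W2 @ ((1.0 - a**2)[:, None] * W1)

    return h, jac


def ekf_step(z, P, o, Phi, target, Q, R, h, jac):
    """One predict/update cycle for the assumed target-directed model."""
    I = np.eye(len(z))
    # Predict: the state relaxes toward the target at a rate set by Phi.
    z_pred = Phi @ z + (I - Phi) @ target
    P_pred = Phi @ P @ Phi.T + Q
    # Update: linearise the network around the predicted state.
    H = jac(z_pred)
    S = H @ P_pred @ H.T + R
    K = np.linalg.solve(S, H @ P_pred).T  # equals P_pred H^T S^{-1}
    z_new = z_pred + K @ (o - h(z_pred))
    P_new = (I - K @ H) @ P_pred
    return z_new, P_new


# Small synthetic demo (all values are illustrative assumptions).
rng = np.random.default_rng(0)
h, jac = make_mlp(rng, n_state=3, n_hidden=8, n_obs=12)
Phi = 0.9 * np.eye(3)
target = np.array([0.5, 1.5, 2.5])      # assumed VTR targets in kHz
Q = 1e-4 * np.eye(3)                    # assumed state noise covariance
R = 2.5e-3 * np.eye(12)                 # assumed observation noise covariance

z_true = np.array([0.3, 1.2, 2.2])
z_est, P = np.array([0.5, 1.5, 2.5]), np.eye(3)
errors = []
for _ in range(50):
    z_true = Phi @ z_true + (np.eye(3) - Phi) @ target + rng.normal(scale=1e-2, size=3)
    o = h(z_true) + rng.normal(scale=5e-2, size=12)
    z_est, P = ekf_step(z_est, P, o, Phi, target, Q, R, h, jac)
    errors.append(np.linalg.norm(z_est - z_true))
print("mean tracking error (kHz):", np.mean(errors))
```

In this sketch the prediction step keeps producing a state estimate even when the observation carries little information, which is the situation the abstract highlights for consonant closure.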
Similar resources
A state-space model with neural-network prediction for recovering vocal tract resonances in fluent speech from Mel-cepstral coefficients
In this paper, we present a state-space formulation of a neural-network-based hidden dynamic model of speech whose parameters are trained using an approximate EM algorithm. This efficient and effective training makes use of the output of an off-the-shelf formant tracker (for the vowel segments of the speech signal), in addition to the Mel-cepstral observations, to simplify the complex sufficien...
Recovering vocal tract shapes from MFCC parameters
Recovering vocal tract shapes from the speech signal is a well-known inversion problem: reversing the transformation from the articulatory system to speech acoustics. Most past studies of this problem have focused on vowels. There have been no general methods effective for recovering vocal tract shapes from the speech signal for all classes of speech sounds. In this paper we describe...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Sensorless Speed Control of Double Star Induction Machine With Five Level DTC Exploiting Neural Network and Extended Kalman Filter
This article presents a sensorless five-level DTC control based on neural networks, using an Extended Kalman Filter (EKF), applied to a Double Star Induction Machine (DSIM). DTC control offers a very interesting solution to the problems of robustness and dynamics. However, this control has some drawbacks, such as the uncontrolled switching frequency and the strong ripple t...
Motion detection by a moving observer using Kalman filter and neural network in soccer robot
In many autonomous mobile applications, robots must be capable of analyzing the motion of moving objects in their environment. While the robot is moving, image quality is degraded by camera shake, which causes large errors in the image processing outputs. In this paper, we propose a novel method to effectively overcome this problem using neural networks and Kalman filtering theory. This technique u...